Abstract

Tor is vulnerable to network congestion and performance 
problems due to bulk data transfers. A large fraction of 
the available network capacity is consumed by a small 
percentage of Tor users, resulting in severe service degra- 
dation for the majority. Bulk users continuously drain 
relays of excess bandwidth, creating new network bot- 
tlenecks and exacerbating the effects of existing ones. 
While this problem may currently be attributed to ratio- 
nal users utilizing the network, it may also be exploited 
by a relatively low-resource adversary using similar techniques
to contribute to a network denial of service (DoS)
attack. Degraded service discourages the use of Tor, af- 
fecting both Tor’s client diversity and anonymity. 

Equipped with mechanisms from communication networks,
we design and implement three Tor-specific algorithms
that throttle bulk transfers to reduce network
congestion and increase network responsiveness. Unlike 
existing techniques, our algorithms adapt to network dy- 
namics using only information local to a relay. We exper- 
iment with full-network deployments of our algorithms 
under a range of light to heavy network loads. We find 
that throttling results in significant improvements to web
client performance while mitigating the negative effects 
of bulk transfers. We also analyze how throttling affects 
anonymity and compare the security of our algorithms 
under adversarial attack. We find that throttling reduces
information leakage compared to unthrottled Tor while 
improving anonymity against realistic adversaries. 

1 Introduction 

The Tor [19] anonymity network was developed in an 
attempt to improve anonymity on the Internet. Onion 
Routing [23,48] serves as the cornerstone for Tor’s overlay
network design. Tor clients encrypt messages in several
“layers” while packaging them into 512-byte packets
called cells, and send them through a collection of relays 
called a circuit. Each relay decrypts its layer and for- 
wards the message to the next relay in the circuit. The 
last relay forwards the message to the user-specified destination.
Each relay can determine only its predecessor
and successor in the path from source to destination, preventing
any single relay from linking the sender and receiver.
Clients choose their first relay from a small set
of entry guards [44, 59] in order to help defend against 
passive logging attacks [58]. Traffic analysis is still pos- 
sible [8,22,28,30,39,42,46,49], but slightly complicated 
by the fact that each relay simultaneously services mul- 
tiple circuits. 

Tor relays are run by volunteers located throughout 
the world and service hundreds of thousands of Tor 
clients [37] with high bandwidth demands. A relay’s 
utility to Tor is dependent on both the bandwidth ca- 
pacity of its host network and the bandwidth restrictions 
imposed by its operator. Although bandwidth donations 
vary widely, the majority of relays offer less than 100 
KiB/s and may become bottlenecks when chosen for a 
circuit. Bandwidth bottlenecks lead to network conges- 
tion and impair client performance. 

Bottlenecks are further aggravated by bulk users, 
which make up roughly five percent of connections and 
forty percent of the bytes transferred through the net- 
work [38]. Bulk traffic increases network-wide conges- 
tion and punishes interactive users as they attempt to 
browse the web and run SSH sessions. Bulk traffic also 
constitutes a simple denial of service (DoS) attack on 
the network as a whole: with nothing but a moderate 
number of bulk clients, an adversary can intentionally 
significantly degrade the performance of the entire Tor
network for most users. This is a malicious attack as op- 
posed to an opportunistic use of resources without regard 
for the impact on legitimate users, and could be used by 
censors [16] to discourage use of Tor. Bulk traffic ef- 
fectively averts potential users from Tor, decreasing both 
Tor’s client diversity and anonymity [10, 18]. 

There are three general approaches to alleviate Tor’s 
performance problems: increase network capacity; opti- 
mize resource utilization; and reduce network load. 
Increasing Capacity. One approach to reducing bottle- 
necks and improving performance is to add additional 
bandwidth to the network from new relays. Previous 
work has explored recruiting new relays by offering per- 
formance incentives to those who contribute [32,41,43]. 
While these approaches show potential, they have not 
been deployed due to a lack of understanding of the 
anonymity and economic implications they would impose
on Tor and its users. It is unclear how an incentive
scheme will affect users’ anonymity and motivation to
contribute: Acquisti et al. [6] discuss how differentiating 
users by performance may reduce anonymity while com- 
petition may reduce the sense of community and con- 
vince users that contributions are no longer warranted. 

New high-bandwidth relays may also be added by the 
Tor Project [4] or other organizations. While effective 
at improving network capacity, this approach is a short- 
term solution that does not scale. As Tor increases speed 
and bandwidth, it will likely attract more users. More 
significantly, it will attract more high-bandwidth and BitTorrent
users, resulting in a Tragedy of the Commons [26]
scenario: the bulk users attracted to the faster network 
will continue to leech the additional bandwidth. 
Optimizing Resource Utilization. Another approach to 
improving performance is to better utilize the available 
network resources. Tor’s path selection algorithm ig- 
nores the slowest small fraction of relays while selecting 
from the remaining relays in proportion to their available 
bandwidth. The path selection algorithm also ignores cir- 
cuits with long build times [12], removing the worst of 
bottlenecks and improving usability. Congestion-aware 
path selection [57] is another approach that aims to bal- 
ance load by using opportunistic and active client mea- 
surements while building paths. However, low band- 
width relays must still be chosen for circuits to mitigate 
anonymity problems, meaning there are still a large num- 
ber of circuits with tight bandwidth bottlenecks. 

Tang and Goldberg previously explored modifications
to the Tor circuit scheduler in order to prioritize bursty 
(i.e. web) traffic over bulk traffic using an exponentially- 
weighted moving average (EWMA) of relayed cells [52]. 
Early experiments show small improvements at a single
relay, but full-network experiments indicate that
the new scheduler has an insignificant effect on performance
[31]. It is unclear how performance is affected
when deployed to the live Tor network. This schedul- 
ing approach attempts to shift network load to better uti- 
lize the available bandwidth, but does not reduce bottle- 
necks introduced by the massive amount of bulk traffic 
currently plaguing Tor. 

Reducing Load. All of the previously discussed ap- 
proaches attempt to increase performance, but none 
of them directly address or provide adequate defense 
against performance degradation problems created by 
bulk traffic clients. In this paper, we address these by 
adaptively throttling bulk data transfers at the client’s en- 
try into the Tor network. 

We emphasize that throttling is fundamentally differ- 
ent than scheduling, and the distinction is important in 
the context of the Tor network. Schedulers optimize the 
utilization of available bandwidth by following policies 
set by the network engineer, allowing the enforcement 
of fairness among flows (e.g. max-min fairness [24, 34] 


or proportional fairness [35]). However, throttling may 
under-utilize local bandwidth resources by intentionally 
imposing restrictions on clients’ throughput to reduce ag- 
gregate network load. 

By reducing bulk client throughput in Tor, we effec- 
tively reduce the bulk data transfer rate through the net- 
work, resulting in fewer bottlenecks and a less congested, 
more responsive Tor network that can better handle the 
burstiness of web traffic. Tor has recently implemented 
token buckets, a classic traffic shaping mechanism [55], 
to statically (non-adaptively) throttle client-to-guard connections
at a given rate [17], but currently deployed configurations
of Tor do not enable throttling by default. Unfortunately,
the throttling algorithm implemented in Tor
requires static configuration of throttling parameters: the
Tor network must determine network-wide settings that
work well and update them as the network changes. Further,
it is not possible to automatically tune each relay’s
throttling configuration with the current algorithm.
Contributions. To the best of our knowledge, we are 
the first to explore throttling algorithms that adaptively
adjust to the fluctuations and dynamics of Tor and each
relay independently without the need to adjust parameters
as the network changes. We also perform the first
detailed investigation of the performance and anonymity 
implications of throttling Tor client connections. 

In Section 3, we introduce and test three algorithms 
that dynamically and adaptively throttle Tor clients us- 
ing a basic token bucket rate-limiter as the underlying 
throttling mechanism. Our new adaptive algorithms use 
local relay information to dynamically select which con- 
nections get throttled and to adjust the rate at which 
those connections are throttled. Adaptively tuned throt- 
tling mechanisms are paramount to our algorithm de- 
signs in order to avoid the need to re-evaluate parame- 
ter choices as network capacity or relay load changes. 
Our bit-splitting algorithm throttles each connection at 
an adaptively adjusted, but reserved and equal portion 
of a guard node’s bandwidth, our flagging algorithm aggressively
throttles connections that have historically exceeded
the statistically fair throughput, and our threshold
algorithm throttles connections above a throughput
quantile at a rate represented by that quantile. 

We implement our algorithms in Tor¹ and test their
effectiveness at improving performance in large-scale,
full-network deployments. Section 4 compares our algorithms
to static (non-adaptive) throttling under a varied
range of network loads. We find that the effectiveness
of static throttling is highly dependent on network load
and configuration whereas our adaptive algorithms work
well under various loads with no configuration changes
or parameter maintenance: web client performance was
improved for every parameter setting we tested. We conclude
that throttling is an effective approach to achieve a
more responsive network.

¹Software patches for our algorithms have been made publicly
available to the community [5].

Figure 1: A Tor relay’s internal architecture.

Having shown that our adaptive throttling algorithms 
provide significant performance benefits for web clients 
and have a profound impact on network responsiveness, 
Section 5 analyzes the security of our algorithms under 
adversarial attack. We discuss several realistic attacks on 
anonymity and compare the information leaked by each 
algorithm relative to unthrottled Tor. Against intuition, 
we find that throttling clients reduces information leak- 
age and improves network anonymity while minimizing 
the false positive impact on honest users. 


2 Background 

This section discusses Tor’s internal architecture, shown 
in Figure 1, to facilitate an understanding of how internal
processes affect client traffic flowing through a Tor relay. 
Multiplexed Connections. All relays in Tor commu- 
nicate using pairwise TCP connections, i.e. each relay 
forms a single TCP connection to each other relay with 
which it communicates. Since a pair of relays may be 
communicating data for several circuits at once, all cir- 
cuits between the pair are multiplexed over their single 
TCP connection. Each circuit may carry traffic for mul- 
tiple services or streams that a user may be accessing. 
TCP offers reliability, in-order delivery of packets be- 
tween relays, and potentially unfair kernel-level conges- 
tion control when multiplexing connections [47]. The 
distinction between and interaction of connections, cir- 
cuits, and streams is important for understanding Tor. 
Connection Input. Tor uses libevent [1] to handle input 
and output to and from kernel TCP buffers. Tor registers
sockets that it wants to read with libevent and configures
a notification callback function. When data arrives
at the kernel TCP input buffer (Figure 1a), libevent
learns about the active socket through its polling interface
and asynchronously executes the corresponding
callback (Figure 1b). Upon execution, the read callback
determines read eligibility using token buckets. 

Token buckets are used to rate-limit connections. Tor 
fills the buckets as defined by configured bandwidth lim- 
its in one-second intervals while tokens are removed 
from the buckets as data is read, although changing that 
interval to improve performance is currently being ex- 
plored [53]. There is a global read bucket that limits 
bandwidth for reading from all connections as well as 
a separate bucket for throttling on a per-connection basis
(Figure 1c). A connection may ignore a read event
if either the global bucket or its connection bucket is 
empty. In practice, the per-connection token buckets 
are only utilized for edge (non-relay) connections. Per- 
connection throttling reduces network congestion by pe- 
nalizing noisy connections, such as bulk transfers, and 
generally leads to better performance [17]. 
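
To make the interaction of the two buckets concrete, the following is a minimal Python sketch of a token bucket that is refilled once per interval and drained as data is read. It is an illustration under stated assumptions, not Tor's implementation: the names TokenBucket, refill, consume, and readable_bytes are hypothetical, and tokens are counted in bytes.

```python
class TokenBucket:
    """Token bucket refilled once per interval (hypothetical names)."""

    def __init__(self, rate: int, burst: int) -> None:
        self.rate = rate      # bytes added at each refill
        self.burst = burst    # bucket capacity in bytes
        self.tokens = burst   # assumption: the bucket starts full

    def refill(self) -> None:
        # Called once per interval; tokens never exceed the burst size.
        self.tokens = min(self.burst, self.tokens + self.rate)

    def consume(self, wanted: int) -> int:
        # Grant as many bytes as there are tokens, possibly zero.
        granted = min(wanted, self.tokens)
        self.tokens -= granted
        return granted


def readable_bytes(global_bucket: TokenBucket,
                   conn_bucket: TokenBucket,
                   wanted: int) -> int:
    """Bytes a connection may read now; zero if either bucket is empty."""
    allowed = min(wanted, global_bucket.tokens, conn_bucket.tokens)
    global_bucket.consume(allowed)
    conn_bucket.consume(allowed)
    return allowed
```

In this sketch a connection with an empty per-connection bucket reads nothing even when the global bucket still holds tokens, which mirrors the read-event behavior described above.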

When a TCP input buffer is eligible for reading, a 
round-robin (RR) scheduling mechanism is used to read 
the smaller of 16 KiB and 1/8 of the global token bucket
size per connection (Figure 1d). This limit is imposed in
an attempt at fairness so that a single connection cannot
consume all the global tokens on a single read. However, 
recent research shows that input/output scheduling leads 
to unfair resource allocations [54]. The data read from 
the TCP buffer is placed in a per-connection application 
input buffer for processing (Figure 1e).
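
The per-read cap can be illustrated with a short Python sketch. It assumes the cap is the smaller of 16 KiB and one eighth of the global token bucket size, and it treats that size as a single integer passed in by the caller; the function names and the dictionary of pending bytes are hypothetical and exist only for this example.

```python
KIB = 1024


def per_read_limit(global_bucket_size: int) -> int:
    """Cap on one read: the smaller of 16 KiB and 1/8 of the global bucket."""
    return min(16 * KIB, global_bucket_size // 8)


def round_robin_pass(pending: dict[str, int],
                     global_tokens: int,
                     global_bucket_size: int) -> dict[str, int]:
    """One round-robin pass over connections with buffered input.

    `pending` maps a connection name to the bytes waiting in its
    kernel buffer. Each connection reads at most the per-read limit
    and never more than the global tokens that remain.
    """
    limit = per_read_limit(global_bucket_size)
    reads: dict[str, int] = {}
    for conn, waiting in pending.items():
        n = min(limit, waiting, global_tokens)
        reads[conn] = n
        global_tokens -= n
    return reads
```

With a 64 KiB global bucket the cap is 8 KiB, so a connection with 100 KiB waiting cannot drain the bucket in one read and a second connection still gets its turn.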

Flow Control. Tor uses an end-to-end flow control algo- 
rithm to assist in keeping a steady flow of cells through 
a circuit. Clients and exit relays constitute the edges of 
a circuit: each is both an ingress and egress point for
data traversing the Tor network. Edges track data flow 
for both circuits and streams using cell counters called 
windows. An ingress edge decrements the correspond- 
ing stream and circuit windows when sending cells, stops 
reading from a stream when its stream window reaches 
zero, and stops reading from all streams multiplexed over 
a circuit when the circuit window reaches zero. Win- 
dows are incremented and reading resumes upon receipt 
of SENDME acknowledgment cells from egress edges.
By default, circuit windows are initialized to 1000
cells (500 KiB) and stream windows to 500 cells (250 
KiB). Circuit SENDMEs are sent to the ingress edge af- 
ter the egress edge receives 100 cells (50 KiB), allowing 
the ingress edge to read, package, and forward 100 ad- 
ditional cells. Stream SENDMEs are sent after receiving 
50 cells (25 KiB) and allow an additional 50 cells. Window
sizes can have a significant effect on performance
and recent work suggests an algorithm for dynamically 
computing them [7]. 
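
The window arithmetic above can be traced with a small Python sketch of circuit-level flow control. Only the circuit window (1000 cells, SENDME every 100 cells) is modeled; stream windows would follow the same pattern with 500 and 50. The class and method names are hypothetical and the sketch ignores everything else an edge does.

```python
CIRCUIT_WINDOW_START = 1000     # cells, default from the text
CIRCUIT_SENDME_INCREMENT = 100  # cells acknowledged per circuit SENDME


class IngressEdge:
    """Sender side: packages cells while the circuit window is open."""

    def __init__(self) -> None:
        self.circuit_window = CIRCUIT_WINDOW_START

    def can_send(self) -> bool:
        return self.circuit_window > 0

    def send_cell(self) -> bool:
        # Returns False when the window is exhausted and reading must stop.
        if not self.can_send():
            return False
        self.circuit_window -= 1
        return True

    def receive_sendme(self) -> None:
        # A circuit SENDME reopens the window by 100 cells.
        self.circuit_window += CIRCUIT_SENDME_INCREMENT


class EgressEdge:
    """Receiver side: acknowledges every 100th cell with a SENDME."""

    def __init__(self) -> None:
        self.received = 0

    def receive_cell(self) -> bool:
        # Returns True when a circuit SENDME should be sent back.
        self.received += 1
        return self.received % CIRCUIT_SENDME_INCREMENT == 0
```

In this model an ingress edge that has sent 1000 unacknowledged cells stalls until the egress edge has received 100 of them and returned a SENDME.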

Cell Processing and Queuing. Data is immediately 
processed as it arrives in connection input buffers (Figure 1f)
and each cell is either encrypted or decrypted depending
on its direction through the circuit. The cell is
then switched onto the circuit corresponding to the next
hop and placed into the circuit’s first-in-first-out (FIFO)
queue (Figure 1g). Cells wait in circuit queues until the
circuit scheduler selects them for writing. 

Scheduling. When there is space available in a con- 
nection’s output buffer, a relay decides which of sev- 
eral multiplexed circuits to choose for writing. Al- 
though historically this was done using round-robin, a 
new exponentially-weighted moving average (EWMA) 
scheduler was recently introduced into Tor [52] and is 
currently used by default (Figure 1h). EWMA records
the number of packets it schedules for each circuit, expo- 
nentially decaying packet counts over time. The sched- 
uler writes one cell at a time chosen from the circuit with 
the lowest packet count and then updates the count. The 
decay means packets sent more recently have a higher 
influence on the count while bursty traffic does not sig- 
nificantly affect scheduling priorities. 
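
As a rough illustration of this scheduling rule, the Python sketch below keeps a decayed cell count per circuit and always writes from the ready circuit with the lowest count. It is a simplification, not the deployed scheduler: decay is applied explicitly to every circuit by a helper, the half-life is an arbitrary example value, and the names Circuit, decay, and schedule_one are hypothetical.

```python
from typing import Optional


class Circuit:
    def __init__(self, name: str, queued: int) -> None:
        self.name = name
        self.queued = queued  # cells waiting in the circuit queue
        self.ewma = 0.0       # exponentially decayed count of scheduled cells


def decay(circuits: list[Circuit], elapsed: float, half_life: float) -> None:
    """Halve every count for each half-life that has elapsed."""
    factor = 0.5 ** (elapsed / half_life)
    for circuit in circuits:
        circuit.ewma *= factor


def schedule_one(circuits: list[Circuit]) -> Optional[Circuit]:
    """Write one cell from the ready circuit with the lowest count."""
    ready = [c for c in circuits if c.queued > 0]
    if not ready:
        return None
    chosen = min(ready, key=lambda c: c.ewma)
    chosen.queued -= 1
    chosen.ewma += 1.0
    return chosen
```

A circuit that has been sending steadily accumulates a high count, so a circuit that becomes active after a quiet period is served first until its own count catches up.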

Connection Output. A cell that has been chosen 
and written to a connection output buffer (Figure 1i)
causes an activation of the write event registered with
libevent for that connection. Once libevent determines
the TCP socket can be written, the write callback is asynchronously
executed (Figure 1j). Similar to connection
input, the relay checks both the global write bucket and
per-connection write bucket for tokens. If the buckets
are not empty, the connection is eligible for writing (Figure 1k)
and again is allowed to write the smaller of 16
KiB and 1/8 of the global token bucket size per connection
(Figure 1l). The data is written to the kernel-level TCP
buffer (Figure 1m) and sent to the next hop.

3 Throttling Client Connections 

Client performance in Tor depends heavily on the traf- 
fic patterns of others in the system. A small number of 
clients performing bulk transfers in Tor are the source 
of a large fraction of total network traffic [38]. The 
overwhelming load these clients place on the network
increases congestion and creates additional bottlenecks,
causing interactive applications, such as instant messaging
and remote SSH sessions, to lose responsiveness.

Figure 2: Throttling occurs at the connection between the
client and guard to capture all streams to various destinations.

This section explores client throttling as a mechanism 
to prevent bulk clients from overwhelming the network. 
Although a relay may have enough bandwidth to han- 
dle all traffic locally, bulk clients that continue producing 
additional traffic cause bottlenecks at other low-capacity 
relays. The faster a bulk downloader gets its data, the 
faster it will pull more into the network. Throttling bulk 
and other high-traffic clients prevents them from pushing 
or pulling too much data into the network too fast, reduc- 
ing these bottlenecks and improving performance for the 
majority of users. Therefore, interactive applications and 
Tor in general will become much more usable, attracting 
new users who improve client diversity and anonymity. 

We emphasize that throttling algorithms are not a re- 
placement for congestion control or scheduling algo- 
rithms, although each approach may cooperate to achieve 
a common goal. Scheduling algorithms are used to man- 
age the utilization of bandwidth, throttling algorithms re- 
duce the aggregate network load, and congestion con- 
trol algorithms attempt to do both. The distinction be- 
tween congestion control and throttling algorithms is 
subtle but important: congestion control reduces circuit 
load while attempting to maximize network utilization, 
whereas throttling reduces network load in an attempt to 
improve circuit performance by explicitly under-utilizing 
connections to bulk clients using too many resources. 
Each approach may independently affect performance, 
and they may be combined to improve the network. 

3.1 Static Throttling 

Recently, Tor introduced the functionality to allow entry
guards to throttle connections to clients [17] (see Figure 2).
This client-to-guard connection is targeted because
all client traffic (using this guard) will flow over
this connection regardless of the number of streams or
the destination associated with each.² The implementation
uses a token bucket for each connection in addition
to the global token bucket that already limits the total
amount of bandwidth used by a relay. The size of
the per-connection token buckets can be specified using
the PerConnBWBurst configuration option, and
the bucket refill rate can be specified by configuring the
PerConnBWRate. The configured throttling rate ensures
that all client-to-guard connections are throttled
to the specified long-term-average throughput while the
configured burst allows deviations from the throttling
rate to account for bursty traffic. The configuration options
provide a static throttling mechanism: Tor will
throttle all connections using these values until directed
otherwise. Note that Tor does not enable or configure
static throttling by default.

²This work does not consider modified Tor clients.

While static throttling is simple, it has two main drawbacks.
First, static throttling requires constant monitoring
and measurements of the Tor network to determine
which configurations work well and which do not in order
to be effective. We have found that there are many
configurations of the algorithm that cause no change in
performance, and worse, there are configurations that
harm performance for interactive applications [33]. This
is the opposite of what throttling is attempting to achieve.
Second, it is not possible under the current algorithm
to auto-tune the throttling parameters for each Tor relay.
Configurations that appear to work well for the network
as a whole might not necessarily be tuned for a given
relay (we will show that this is indeed the case in Section 4).
Each relay has very different capabilities and
load patterns, and therefore may require different throttling
configurations to be most useful.

3.2 Adaptive Throttling 

Given the drawbacks of static throttling, we now explore 
and present three new algorithms that adaptively adjust 
throttling parameters according to local relay informa- 
tion. This section details our algorithms while Section 4 
explores their effect on client performance and Section 5 
analyzes throttling implications for anonymity. 

There are two main issues to consider when design- 
ing a client throttling algorithm: which connections to 
throttle and at what rate to throttle them. The approach 
discussed above in Section 3.1 throttles all client connections
at the statically specified rate. Each of our three
algorithms below answers these questions adaptively by 
considering information local to each relay. Note that our 
algorithms dynamically adjust the PerConnBWRate 
while keeping a constant PerConnBWBurst.³
Bit-splitting. A simple approach to adaptive throttling 
is to split a guard’s bandwidth equally among all active 
client connections and throttle them all at this fair split 
rate. The PerConnBWRate will therefore be adjusted 
as new connections are created or old connections are 
destroyed: more connections will result in lower rates. 
No connection will be able to use more than its allotted
share of bandwidth unless it has unused tokens in its
bucket. Inspired by Quality of Service (QoS) work from
communication networks [11,50,60], bit-splitting will
prevent bulk clients from unfairly consuming bandwidth
and ensure that there is a minimum “reserved” bandwidth
for clients of all types.

³Our experiments [33] indicate that a 2 MiB burst is ideal as it allows
directory requests to be downloaded unthrottled during bootstrapping
while also throttling bulk traffic relatively quickly. The burst may
need to be increased if the directory information grows beyond 2 MiB.

Algorithm 1 Throttling clients by splitting bits.
 1: B ← getRelayBandwidth()
 2: L ← getConnectionList()
 3: N ← L.length()
 4: if N > 0 then
 5:   splitRate ← B / N
 6:   for i ← 1 to N do
 7:     if L[i].isClientConnection() then
 8:       L[i].throttleRate ← splitRate
 9:     end if
10:   end for
11: end if

Note that Internet Service Providers employ similar 
techniques to throttle their customers; however, their
client base is much less dynamic than the connections an 
entry guard handles. Therefore, our adaptive approach is 
more suitable to Tor. We include this algorithm in our 
analysis of throttling to determine what is possible with 
such a simple approach. 
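
Algorithm 1 is short enough to sketch directly in Python. The Connection class and its field names are hypothetical stand-ins for a relay's connection records, and the relay bandwidth is passed in as a plain number. As in Algorithm 1, the split is computed over all of the relay's connections while only client connections are throttled.

```python
import math
from dataclasses import dataclass


@dataclass
class Connection:
    is_client: bool
    throttle_rate: float = math.inf  # unthrottled by default


def bit_split(relay_bandwidth: float, connections: list[Connection]) -> None:
    """Throttle every client connection at an equal share of the bandwidth."""
    n = len(connections)
    if n == 0:
        return
    split_rate = relay_bandwidth / n
    for conn in connections:
        if conn.is_client:
            conn.throttle_rate = split_rate
```

Calling bit_split again whenever a connection opens or closes reproduces the adaptive behavior described above: more connections yield a lower per-connection rate.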

Flagging Unfair Clients. The bit-splitting algorithm focuses
on adjusting the throttle rate and applying this to
all client connections. Our next algorithm takes the opposite
approach: configure a static throttling rate and adjust
which connections get throttled. The intuition behind
this approach is that if we can properly identify the
connections that use too much bandwidth, we can throttle
them in order to maximize the benefit we gain per throttled
connection. Therefore, our flagging algorithm attempts
to classify and throttle bulk traffic while it avoids
throttling web clients.

Since deep packet inspection is not desirable for privacy
reasons, and is not possible on encrypted Tor traffic,
we instead draw upon existing statistical fingerprinting
classification techniques [14,29,36] that classify traffic
solely on its statistical properties. When designing the
flagging algorithm, we recognize that Tor already contains
a statistical throughput measure for scheduling traffic
on circuits using an exponentially-weighted moving
average (EWMA) of recently sent cells [52]. We can use
the same statistical measure on client connections to classify
and throttle bulk traffic.
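
One way to picture EWMA-based classification is the Python sketch below. It keeps a decayed cell count per client connection and compares it with a reference count that is fed the fair-share amount at every update; a connection above the reference is flagged and throttled at a fixed rate, and it is released once its count drops below a fraction of the reference. Everything here is an assumption made for illustration: the class names, the one-second update interval, the way the reference is accumulated, and the parameter names flag_rate, penalty, and half_life.

```python
import math
from dataclasses import dataclass


@dataclass
class ClientConn:
    ewma: float = 0.0
    flagged: bool = False
    throttle_rate: float = math.inf


class FlaggingThrottle:
    def __init__(self, flag_rate: float, penalty: float, half_life: float) -> None:
        self.flag_rate = flag_rate  # static rate applied to flagged connections
        self.penalty = penalty      # fraction of the reference that clears a flag
        self.half_life = half_life  # seconds for a count to decay by half
        self.meta_ewma = 0.0        # reference count fed with the fair share

    def tick(self, conns: list[ClientConn], cells_sent: list[int],
             fair_share_cells: float, elapsed: float = 1.0) -> None:
        """Update all counts for one interval and re-evaluate flags."""
        decay = 0.5 ** (elapsed / self.half_life)
        self.meta_ewma = self.meta_ewma * decay + fair_share_cells
        for conn, sent in zip(conns, cells_sent):
            conn.ewma = conn.ewma * decay + sent
            if conn.ewma > self.meta_ewma:
                conn.flagged = True
                conn.throttle_rate = self.flag_rate
            elif conn.flagged and conn.ewma < self.penalty * self.meta_ewma:
                conn.flagged = False
                conn.throttle_rate = math.inf
```

The gap between the flagging condition and the release condition gives hysteresis: a connection that briefly exceeds its fair share is not released the moment it dips back under the reference, only after it has stayed well below it.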

The flagging algorithm, shown in Algorithm 2, requires
that each guard keeps an EWMA of the number
of recently sent cells per client connection. The per-connection
cell EWMA is computed in much the same
way as the per-circuit cell EWMA: whenever the circuit’s
cell counter is incremented, so is the cell counter
of the connection to which that circuit belongs. Note
that clients can not affect others’ per-connection EWMA
since all of a client’s circuits are multiplexed over a
single throttled guard-to-client connection.⁴ The per-connection
EWMA is enabled and configured independently
of its circuit counterpart.

We rely on the observation that bulk connections will
have higher EWMA values than web connections since
bulk clients are steadily transferring data while web
clients “think” between each page download. Using this
to our advantage, we can flag connections as containing
bulk traffic as follows. Each relay keeps a single separate
meta-EWMA M of cells transferred. M is adjusted
by calculating the fair bandwidth split rate as in the bit-splitting
algorithm, and tracking its EWMA over time.
M does not correspond with any real traffic, but represents
the upper bound of a connection-level EWMA
if a connection were continuously sending only its fair
share of traffic through the relay. Any connection whose
EWMA exceeds M is flagged as containing bulk traffic
and penalized by being throttled.

There are three main parameters for the algorithm. As
mentioned above, a per-connection half-life H allows
configuration of the connection-level half-life independent
of that used for circuit scheduling. H affects how
long the algorithm remembers the amount of data a connection
has transferred, and has precisely the same meaning
as the circuit priority half-life [52]. Larger half-life
values increase the ability to differentiate bulk from web
connections while smaller half-life values make the algorithm
more immediately reactive to throttling bulk connections.
We would like to allow for a specification of
the length of each penalty once a connection is flagged
in order to recover and stop throttling connections that
may have been incorrectly flagged. Therefore, we introduce
a penalty fraction parameter P that affects how long
each connection remains in a flagged and throttled state.
If a connection’s cell count EWMA falls below P · M,
its flag is removed and the connection is no longer throttled.
Finally, the rate at which each flagged connection is
throttled, i.e. the FlagRate, is statically defined and is
not adjusted by the algorithm.

Note that the flagging parameters need only be set
based on system-wide policy and generally do not require
independent relay tuning, but provides the flexibility
to allow individual relay operators to deviate from
system policy if they desire.

⁴The same is not true for the unthrottled connections between relays
since each of them contain several circuits and each circuit may belong
to a different client (see Section 2).

Algorithm 2 Throttling clients by flagging bulk connections,
considering a moving average of throughput.
Require: flagRate, P, H
 1: B ← getRelayBandwidth()
 2: L ← getConnectionList()
 3: N ← L.length()
 4: M ← getMetaEWMA()
 5: if N > 0 then
 6:   splitRate ← B / N
 7:   M ← M.increment(H, splitRate)
 8:   for i ← 1 to N do
 9:     if L[i].isClientConnection() then
10:       if L[i].EWMA > M then
11:         L[i].flag ← True
12:         L[i].throttleRate ← flagRate
13:       else if L[i].flag = True ∧ L[i].EWMA < P · M then
14:         L[i].flag ← False
15:         L[i].throttleRate ← infinity
16:       end if
17:     end if
18:   end for
19: end if

Throttling Using Thresholds. Recall the two main issues
a throttling algorithm must address: selecting which
connections to throttle and the rate at which to throttle
them. Our bit-splitting algorithm explored adaptively
adjusting the throttle rate and applying this to all connections
while our flagging algorithm explored statically
configuring a throttle rate and adaptively selecting the
throttled connections. We now describe our final algorithm
which attempts to adaptively address both issues.

Algorithm 3 Throttling clients considering the loudest threshold
of connections.
Require: T, R, F
 1: L ← getClientConnectionList()
 2: N ← L.length()
 3: if N > 0 then
 4:   selectIndex ← floor(T · N)
 5:   L ← reverseSortEWMA(L)
 6:   thresholdRate ← L[selectIndex].getMeanThroughput(R)
 7:   if thresholdRate < F then
 8:     thresholdRate ← F
 9:   end if
10:   for i ← 1 to N do
11:     if i ≤ selectIndex then
12:       L[i].throttleRate ← thresholdRate
13:     else
14:       L[i].throttleRate ← infinity
15:     end if
16:   end for
17: end if

The threshold algorithm also makes use of a
connection-level cell EWMA, which is computed as described
above for the flagging algorithm. However,
EWMA is used here to sort connections from loudest
to quietest. We then select and throttle the loudest fraction
T of connections, where T is a configurable threshold.
For example, setting T to 0.1 means the loudest ten
percent of client connections will be throttled. The selection
is adaptive since the EWMA changes over time
according to each connection’s bandwidth usage.

We have adaptively selected which connections to
throttle and now must determine a throttle rate. To do
this, we require that each connection tracks its throughput
over time. We choose the average throughput rate
of the connection with the minimum EWMA from the
set of connections being throttled. For example, when
T = 0.1 and there are 100 client connections sorted from
loudest to quietest, the chosen throttle rate is the average
throughput of the tenth connection. Each of the first ten
connections is then throttled at this rate. In our prototype,
we approximate the throughput rate as the average number
of bytes transferred over the last R seconds, where
R is configurable. R represents the period of time between
which the algorithm re-selects the throttled connections,
adjusts the throttle rates, and resets each connection’s
throughput counters.

There is one caveat to the algorithm as described 
above. In our experiments in Section 4, we noticed 
that occasionally the throttle rate chosen by the thresh- 
old algorithm was zero. This would happen if the mean 
throughput of the threshold connection (line 6 in Algo- 
rithm 3) did not send data over the last TZ seconds. To 
prevent a throttle rate of zero, we added a parameter to 
statically configure a throttle rate floor T so that no con- 
nection would ever be throttled below F. Algorithm 3 
details threshold adaptive throttling. 
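
The selection and rate steps described above can be illustrated with a minimal Python sketch. It is not the Tor patch or a transcription of Algorithm 3: the Connection class, the function name, and the use of floor rounding to pick the number of throttled connections are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Connection:
    name: str
    ewma: float           # connection-level cell EWMA (higher means louder)
    bytes_in_period: int  # bytes transferred during the last refresh period


def threshold_throttle(connections, threshold, period_seconds, floor_rate):
    """Return (throttled_names, rate) for one refresh period.

    threshold      -- fraction T of the loudest connections to throttle
    period_seconds -- refresh period R, in seconds
    floor_rate     -- throttle rate floor F, in bytes per second
    """
    # Sort from loudest to quietest by EWMA.
    ordered = sorted(connections, key=lambda c: c.ewma, reverse=True)
    # Assumption: round the selection size down.
    count = math.floor(threshold * len(ordered))
    if count == 0:
        return [], None  # nothing to throttle this period
    selected = ordered[:count]
    # The quietest selected connection sets the rate for the whole set.
    threshold_conn = selected[-1]
    mean_throughput = threshold_conn.bytes_in_period / period_seconds
    # Never throttle below the configured floor (avoids a zero rate).
    rate = max(mean_throughput, floor_rate)
    return [c.name for c in selected], rate
```

In a relay, a routine like this would run once every R seconds, after which the per-connection byte counters would be reset.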

4 Experiments 

In this section we explore the performance benefits possi- 
ble with each throttling algorithm specified in Section 3. 
We perform experiments with Shadow [2, 31], an accu- 
rate and efficient discrete event simulator that runs real 
Tor code over a simulated network. Shadow allows us to 
run an entire Tor network on a single machine and config- 
ure characteristics such as network latency, bandwidth, 
and topology. Since Shadow runs real Tor, it accurately 
characterizes application behavior and allows us to focus 
on experimental comparison of our algorithms. A direct 
comparison between Tor and Shadow-Tor performance 
is presented in [31]. 

Experimental Setup. Using Shadow, we configure a pri- 
vate Tor network with 200 HTTP servers, 950 Tor web 
clients, 50 Tor bulk clients, and 50 Tor relays. The
distribution of clients in our experiments approximates that
found by McCoy et al. [38]. All of our nodes run inside 
the Shadow simulation environment. 

In our experiments, each client node runs Tor in client- 
only mode as well as an HTTP client application config- 
ured to download over Tor’s SOCKS proxy available on 
the local interface. Each web client downloads a 320 KiB 
file* from a randomly selected one of our HTTP servers,
and pauses for a length of time drawn from the UNC 
“think time” data set [27] before downloading the next 
file. Each bulk client repeatedly downloads a 5 MiB file 
from a randomly selected HTTP server without pausing. 
Clients track the time to the first and the last byte of the 
download as indications of network responsiveness and 
overall performance. 

Tor relays are configured with bandwidth parameters 
according to a Tor network consensus document.† We
configure our network topology and latency between 
nodes according to the geographical distribution of re- 
lays and pairwise PlanetLab node ping times. Our sim- 
ulated network mirrors a previously published Tor net- 
work model [31] that has been compared to and shown to 
closely approximate the load of the live Tor network [3]. 

We focus on the time to the first data byte for web 
clients as a measure of network responsiveness, and 
the time to the last data byte — the download time — for 
both web and bulk clients as a measure of overall per- 
formance. In our results, “vanilla” represents unmod- 
ified Tor using a round-robin circuit scheduler and no 
throttling — the default settings in the Tor software — and 
can be used to compare relative performance between 
experiments. Each experiment uses network-wide de- 
ployments of each configuration. To further reduce ran- 
dom variances, we ran all configurations five times each. 
Therefore, every curve on every CDF shows the cumula- 
tive results of five experiments. 

Results. Our results focus on the algorithmic config- 
urations that we found to maximize web client perfor- 
mance [33] while we show how the algorithms perform 
when the network load varies from light (25 bulk clients) 
to medium (50 bulk clients) to heavy (100 bulk clients). 
The experimental setup is otherwise unmodified from the 
model described above. Running the algorithms under 
various loads allows us to highlight the unique and novel 
features each provides. 

Figure 3 shows client performance for our algorithms. 
The time to first byte indicates network responsiveness 
for web clients while the download time indicates overall 
client performance for web and bulk clients. Client per- 
formance is shown for the lightly loaded network in Fig- 
ures 3a-3c, the normally loaded network in Figures 3d- 
3f, and the heavily loaded network in Figures 3g-3i. 

* The average webpage size reported by Google web metrics [45].

† Retrieved on 2011-04-27 and valid from 03-06:00:00.

[Figure 3, panels (g)-(i): (g) 320 KiB clients, heavy load; (h) 320 KiB clients, heavy load; (i) 5 MiB clients, heavy load]

Figure 3: Comparison of client performance for each throttling algorithm and vanilla Tor, under various loads. All experiments use
950 web clients. We vary the load between “light,” “medium,” and “heavy” by setting the number of bulk clients to 25 for 3a-3c,
to 50 for 3d-3f, and to 100 for 3g-3i. The time to first byte indicates network responsiveness while the download time indicates
overall client performance. The parameters for each algorithm are tuned based on experiments presented in [33].


Overall, static throttling results in the least amount of 
bulk traffic throttling while providing the lowest bene- 
fit to web clients. For the bit-splitting algorithm, we 
see improvements over static throttling for web clients 
for both time to first byte and overall download times, 
while download times for bulk clients are also slightly 
increased. Flagging and threshold throttling perform 
somewhat more aggressive throttling of bulk traffic and 
therefore also provide the greatest improvements in web 
client performance. 

We find that each algorithm is effective at throttling 
bulk clients independent of network load, as evident in 
Figures 3c, 3f and 3i. However, performance benefits for 
web clients vary slightly as the network load changes. 
When the number of bulk clients is halved, throughput
in Figure 3b is fairly similar across algorithms. However,
when the number of bulk clients is doubled, responsiveness
in Figure 3g and throughput in Figure 3h
for both the static throttling and the adaptive bit-splitting 
algorithm lag behind the performance of the flagging and 
threshold algorithms. Static throttling would likely re- 
quire a reconfiguration of throttling parameters while bit- 
splitting adjusts the throttle rate less effectively than our 
flagging and threshold algorithms. 

As seen in Figures 3a, 3d, and 3g, as the load 
changes, the strengths of each algorithm become appar- 
ent. The flagging and threshold algorithms stand out as 
the best approaches for both web client responsiveness 
and throughput, and Figures 3c, 3f, and 3i show that 
they are also most aggressive at throttling bulk clients. 
The flagging algorithm appears very effective at accurately
classifying bulk connections regardless of network
load. The threshold algorithm maximizes web client
performance in our simulations among all loads and all
algorithms tested, since it effectively throttles the worst
bulk clients while utilizing extra bandwidth when possible.
Both the threshold and flagging algorithms perform
well over all network loads tested, and their usage in Tor
would require little-to-no maintenance while providing
significant performance improvements for web clients.

Load     Metric       vanilla   static   split   flag   thresh
light    Data (GiB)      88.3     80.3    78.3   72.1     69.8
         Web (%)         74.5     83.7    85.9   92.7     90.1
         Bulk (%)        25.5     16.3    14.1    7.3      9.9
medium   Data (GiB)      92.2     88.6    84.7   77.7     76.3
         Web (%)         65.8     72.4    75.0   86.2     82.8
         Bulk (%)        34.2     27.6    25.0   13.8     17.2
heavy    Data (GiB)      94.7     91.1    85.0   81.7     85.0
         Web (%)         55.8     60.5    64.3   75.4     71.2
         Bulk (%)        44.2     39.5    35.7   24.6     28.8

Table 1: Total data downloaded in our simulations by client
type. Throttling reduces the bulk traffic share of the load on the
network. The flagging algorithm is the best at throttling bulk
traffic under light, medium, and heavy loads of 25, 50, and 100
bulk clients, respectively.

Aggregate download statistics are shown in Table 1. 
The results indicate that we are approximating the load 
distribution measured by McCoy et al. [38] reasonably 
well. The data also indicates that as the number of 
bulk clients in our simulation increases, so does the total 
amount of data downloaded and the bulk fraction of the 
total as expected. The data also shows that all throttling 
algorithms reduce the total network load. Static throttling
reduces load the least, while our adaptive flagging
algorithm is the best at reducing both overall load
and the bulk percentage of network traffic. Each of our
adaptive algorithms is better at reducing load than static
throttling, due to its ability to adapt to network dynamics.
The relative difference between each algorithm’s
effectiveness at reducing load roughly corresponds to the
relative difference in web client performance in our
experiments, as we discussed above.
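
The aggregate figures in Table 1 can be turned into relative load reductions and absolute bulk volumes with a few lines of arithmetic. The following Python sketch copies the Table 1 values for the light and heavy loads; the dictionary layout and the function names are illustrative assumptions and are not part of our tooling.

```python
# Total data downloaded (GiB) and bulk share (%), copied from Table 1.
TOTAL_GIB = {
    "light": {"vanilla": 88.3, "static": 80.3, "split": 78.3,
              "flag": 72.1, "thresh": 69.8},
    "heavy": {"vanilla": 94.7, "static": 91.1, "split": 85.0,
              "flag": 81.7, "thresh": 85.0},
}
BULK_PERCENT = {
    "light": {"vanilla": 25.5, "static": 16.3, "split": 14.1,
              "flag": 7.3, "thresh": 9.9},
    "heavy": {"vanilla": 44.2, "static": 39.5, "split": 35.7,
              "flag": 24.6, "thresh": 28.8},
}


def load_reduction_percent(load, algorithm):
    """Reduction in total data relative to vanilla Tor, in percent."""
    base = TOTAL_GIB[load]["vanilla"]
    return 100.0 * (base - TOTAL_GIB[load][algorithm]) / base


def bulk_gib(load, algorithm):
    """Absolute amount of bulk data, in GiB."""
    return TOTAL_GIB[load][algorithm] * BULK_PERCENT[load][algorithm] / 100.0


if __name__ == "__main__":
    for load in TOTAL_GIB:
        for algorithm in TOTAL_GIB[load]:
            print(load, algorithm,
                  round(load_reduction_percent(load, algorithm), 1),
                  round(bulk_gib(load, algorithm), 1))
```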

Discussion. The best algorithm for Tor depends on mul- 
tiple factors. Although not maximizing web client per- 
formance, bit-splitting is the simplest, the most efficient, 
and the most network neutral approach (every connec- 
tion is allowed the same portion of a guard’s capacity). 
This “subtle” or “delicate” approach to throttling may be 
favorable if supporting multiple client behaviors is desirable.
Conversely, the flagging algorithm may be used
to identify a specific class of traffic and throttle it ag- 
gressively, creating the potential for the largest increase 
in performance for unthrottled traffic. We are currently 
exploring improvements to our statistical classification 
techniques to reduce false positives and to improve the
control over traffic of various types. For these reasons,
we feel the bit-splitting and flagging algorithms will be 
the most useful in various situations. We suggest that 
perhaps bit-splitting is the most appropriate throttling al- 
gorithm to use initially, even if something more aggres- 
sive is desirable in the long term. 

While requiring little maintenance, our algorithms 
were designed to use only local relay information. 
Therefore, they are incrementally deployable while re- 
lay operators may choose the desired throttling algorithm 
independent of others. Our algorithms are already imple- 
mented in Tor and software patches are available [5]. 

5 Analysis and Discussion 

Having shown the performance benefits of throttling bulk 
clients in Section 4, we now analyze the security of 
throttling against adversarial attacks on anonymity. We 
will discuss the direct impact of throttling on anonymity; 
what an adversary can learn when guards throttle clients 
and how the information leaked affects the anonymity of 
the system. We lastly discuss potential strategies clients 
may use to elude the throttles. 

Before exploring practical attacks, we introduce two 
techniques an adversary may use to gather information 
about the network given that a generic throttling algo- 
rithm is enabled at all guards. Similar techniques used 
for throughput-based traffic analysis outside the context 
of throttling are discussed in detail by Mittal et al. [39]. 
Discussion about the security of our throttling algorithms 
in the context of practical attacks will follow. 

5.1 Gathering Information 

Our analysis uses the following terminology. At time $t$,
the throughput of a connection between a client and a
guard is $\lambda_t$, the rate at which the client will be
throttled is $\alpha_t$, and the allowed data burst is $\beta$.
Note that, as consistent with our algorithms, the throttle
rate may vary over time but the burst is a static
system-wide parameter.

Probing Guards. Using the above terminology, a connection
is throttled if, over the last $s$ seconds, its throughput
exceeds the allowed initial burst and the long-term
throttle rate:

$$\sum_{k=t-s}^{t} \lambda_k > \beta + \sum_{k=t-s}^{t} \alpha_k \qquad (1)$$

A client may perform a simple technique to probe a spe- 
cific guard node and determine the rate at which it gets 
throttled. The client may open a single circuit through 
the guard, selecting other high-bandwidth relays to en- 
sure that the circuit does not contain a bottleneck. Then, 
it may download a large file and observe the change in 
throughput after receiving a burst of $\beta$ payload bytes.

Figure 4: (a) Clients may discover the throttle rate by probing guards. (b) Information leaked by learning circuit throughputs.
(c) Information leaked by learning guards’ throttle rates.


If the first $\beta$ bytes are received at time $t_1$ and the
download finishes at time $t_2 > t_1$, the throttle rate at any
time $t$ in this interval can be approximated by the mean
throughput leading up to $t$:

$$\alpha_t \approx \frac{\sum_{k=t_1}^{t} \lambda_k}{t - t_1} \qquad (2)$$

Therefore, the estimate at $t = t_2$ approximates the actual
throttle rate. Note that this approximation may underestimate
the actual throttle rate if the throughput falls below the
throttle rate during the measured interval.

We simulate probing in Shadow [2, 31] to show its ef- 
fectiveness against the static throttling algorithm. As ap- 
parent in Figure 4a, the throttle rate was configured at 5 
KiB/s and the burst at 2 MiB. With enough resources, an 
adversary may probe every guard node to form a com- 
plete list of throttle rates. 
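
The throttling condition and the probing estimate can be sketched in a few lines of Python. This is a discretized illustration only: it assumes one throughput sample per second, averages the samples received after the burst is exhausted, and uses function names that are hypothetical and do not appear in our implementation.

```python
def is_throttled(throughputs, rates, burst):
    """Throttling condition: bytes sent over the window exceed the burst
    plus the bytes allowed by the throttle rate over the same window."""
    return sum(throughputs) > burst + sum(rates)


def estimate_throttle_rate(samples, burst):
    """Estimate the throttle rate from one probe download.

    samples -- bytes received during each one-second interval
    burst   -- burst size beta, in bytes
    Returns the mean throughput after the burst is exhausted (bytes per
    second), or None if the burst was never exhausted or no samples follow.
    """
    received = 0
    for second, count in enumerate(samples):
        received += count
        if received >= burst:
            t1 = second  # interval in which the first beta bytes completed
            break
    else:
        return None
    after = samples[t1 + 1:]
    if not after:
        return None
    return sum(after) / len(after)
```

As noted above, such an estimate can fall below the true throttle rate whenever the probe circuit itself is the bottleneck during the measured interval.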

Testing Circuit Throughput. A web server may deter- 
mine the throughput of a connecting client’s circuit by 
using a technique similar to that presented by Hopper 
et al. [30]. When the server gets an HTTP request from 
a client, it may inject either special JavaScript or a large 
amount of garbage HTML into a form element included 
in the response. The injected code will trigger a second 
client request after the original response is received. The 
server may adjust the amount of returned data and mea- 
sure the time between when it sent the first response and 
when it received the second request to approximate the 
throughput of the circuit. 

5.2 Adversarial Attacks 

We now explore several adversarial attacks in the con- 
text of client throttling algorithms, and how an adversary 
may use those attacks to learn information and affect the 
anonymity of a client. 

Attack 1. In our first attack, an adversary obtains a dis- 
tribution on throttle rates by probing all Tor guard relays. 


We assume the adversary has resources to perform such 
an attack, e.g. by utilizing a botnet or other distributed 
network such as PlanetLab [13]. The adversary then ob- 
tains access to a web server and tests the throughput of a 
target circuit. With this information, the adversary may 
reduce the anonymity set of the circuit’s potential guards 
by eliminating those whose throttle rate is inconsistent 
with the measured circuit throughput. 

This attack is somewhat successful against all of 
the throttling algorithms we have described. For bit- 
splitting, the anonymity set of possible guard nodes will 
consist of those whose bandwidth and number of active 
connections would throttle to the throughput of the target 
circuit or higher. By running the attack repeatedly over 
time, an intersection will narrow the set to those whose 
throttle rate is consistent with the target circuit through- 
put at all measured times. 

The flagging algorithm throttles all flagged connec- 
tions to the same rate system-wide. (We assume here 
that the set of possible guards is already narrowed to 
those whose bandwidth is consistent with the target cir- 
cuit’s throughput irrespective of throttling.) A circuit 
whose throughput matches the system-wide rate is either 
flagged at some guard or just coincidentally matches the 
system-wide rate and is not flagged because its EWMA 
has remained below the splitRate (see Algorithm 2) 
for its guard long enough to not be flagged or become 
unflagged. The throttling rate is thus not nearly as infor- 
mative as for bit-splitting. If we run the attack repeatedly 
however, we can eliminate from the anonymity set any 
guard such that the EWMA of the target circuit should 
have resulted in a throttling but did not. Also, if the 
EWMA drops to the throttling rate at precise times (ig- 
noring unusual coincidence), we can eliminate any guard 
that would not have throttled at precisely those times. 
Note that this determination must be made after the fact 
to account for the burst bucket of the target circuit, but it 
can still be made precisely.

The potential for information going to the attacker in
the threshold algorithm is a combination of the potential 
in each of the above two algorithms. The timing of when 
a circuit gets throttled (or does not when it should have 
been) can narrow the anonymity set of entry guards as in 
the flagging algorithm. Once the circuit has been throt- 
tled, then any fluctuation in the throttling rate that sepa- 
rates out the guard nodes can be used to further narrow 
the set. Note that if a circuit consistently falls below the 
throttling rate of all guards, an attacker can learn nothing 
about its possible entry guard from this attack. Attack 2 
considerably improves the situation for the adversary. 

We simulated this attack in Shadow [2, 31]. An ad- 
versary probes all guards and forms a distribution on the 
throttle rate at which a connection would become throt- 
tled. We then form a distribution on circuit throughputs 
over each minute, and remove any guard whose throttle 
rate is outside a range of one standard deviation of those 
throughputs. Since there are 50 guards, the maximum 
entropy is $\log_2(50) \approx 5.64$; the entropy lost by this attack
for various throttling algorithms relative to vanilla
Tor is shown in Figure 4b. We can see that the static 
algorithm actually loses no information, since all con- 
nections are throttled to the same rate, while vanilla Tor 
without throttling actually loses more information than 
any of the throttling algorithms. Therefore, the distri- 
bution on guard bandwidth leaks more information than 
throttled circuits’ throughputs. 
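
The filtering step and the entropy accounting used for this attack can be sketched as follows. The sketch assumes a uniform distribution over the guards that remain in the anonymity set and interprets "one standard deviation" as a band around the mean of the observed throughputs; both choices, and the function names, are assumptions made for illustration.

```python
import math
import statistics


def consistent_guards(guard_rates, circuit_throughputs):
    """Keep guards whose throttle rate lies within one standard deviation
    of the mean of the observed circuit throughputs."""
    mean = statistics.mean(circuit_throughputs)
    spread = statistics.pstdev(circuit_throughputs)
    return [guard for guard, rate in guard_rates.items()
            if abs(rate - mean) <= spread]


def entropy_lost(total_guards, remaining_guards):
    """Bits of entropy lost when the candidate set shrinks, assuming a
    uniform distribution over candidate guards."""
    return math.log2(total_guards) - math.log2(remaining_guards)
```

With 50 guards the maximum entropy is log2(50) bits, so an attack that halves the candidate set costs the client one bit under the uniform assumption.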

Attack 2. As in Attack 1, the adversary again obtains 
a distribution on throttle rates of all guards in the sys- 
tem. However, the adversary slightly modifies its circuit 
testing by continuously sending garbage responses. The 
adversary adjusts the size of each response so that it may 
compute the throughput of the circuit over time and ap- 
proximates the rate at which the circuit is throttled. By 
comparing the estimated throttle rate to the distribution 
on guard throttle rates, the adversary may again reduce 
the anonymity set by removing guards whose throttle rate 
is inconsistent with the estimated circuit throttle rate. 

For bit-splitting, by raising and lowering the rate of 
garbage sent, the attacker can match this with the throt- 
tled throughput of each guard. The only guards in the 
anonymity set would be those that share the same throt- 
tling rate that matches the flooded circuit’s throughput 
at all times. To maximize what he can learn from flag- 
ging, the adversary should raise the EWMA of the target 
circuit at a rate that will allow him to maximally differ- 
entiate guards with respect to when they would begin to 
throttle a circuit. If this does not uniquely identify the 
guard, he can also use the rate at which he diminishes 
garbage traffic to try to learn more from when the target
circuit stops being throttled. As in Attack 1 for the
threshold algorithm, the adversary can match the signature
of both fluctuations in throttling rate over time and
the timing of when throttling is applied to narrow the set
of possible guards for a target circuit.

We simulated this attack using the same data set as 
Attack 1. Figure 4c shows that a connection’s throttle 
rate generally leaks slightly more information than its 
throughput. As in Attack 1, guards’ bandwidth in our 
simulation leaks more information than the throttle rate 
of each connection for all but the flagging algorithm. 
Attack 3. An adversary controlling two malicious 
servers can link streams of a client connecting to each 
of them at the same time. The adversary uses the circuit 
testing technique to send a response of $\beta/2$ bytes in size to
each of two requests. Then, small “test” responses are re- 
turned after receiving the clients’ second requests. If the 
throughput of each circuit when downloading the “test” 
response is consistently throttled, then it is possible that 
the requests are coming from the same client. This at- 
tack relies on the observation that all traffic on the same 
client-to-guard connection will be throttled at the same 
time since each connection has a single burst bucket. 

This attack is intended to indicate if and when a circuit 
is throttled, rather than the throttling rate. It will there- 
fore not be effective against bit splitting, but will work 
against flagging or threshold throttling. 

Attack 4. Our final attack is an active denial of service 
attack that can be used to confirm a circuit’s entry guard 
with high probability. In this attack, the adversary at- 
tempts to adjust the throttle rate of each guard in order 
to identify whether it carries a target circuit. An adver- 
sary in control of a malicious server may monitor the 
throughput of a target circuit over time, and may then 
open a large number of connections to each guard node 
until a decrease in the target circuit’s throughput is ob- 
served. To confirm that a guard is on the target circuit, 
the adversary can alternate between opening and closing 
guard connections and continue to observe the through- 
put of the target circuit. If the throughput is consistent 
with the adversary’s behavior, it has found the circuit’s 
guard with high probability. 

The one thing not controlled by the adversary in 
Attack 2 is a guard’s criterion for throttling at a
given time: splitRate for bit-splitting and flagging
and selectIndex for threshold throttling (see Algorithms
1, 2, and 3). All of these are controlled by the
number of circuits at the guard, which Attack 4 places 
under the control of the adversary. Thus, under Attack 4, 
the adversary will have precise control over which cir- 
cuits get throttled at which rate at all times and can there- 
fore uniquely determine the entry guard. 

Note that all of Attacks 1, 2, and 4 are intended to 
learn about the possible entry guards for an attacked cir- 
cuit. Even if completely successful, this does not fully 
de-anonymize the circuit. But since guards themselves 
are chosen for persistent use by a client, they can add
to pseudonymous profiling and can be combined with
other information, such as that uncovered by Attack 3, 
to either reduce anonymity of the client or build a richer 
pseudonymous profile of it. 

5.3 Eluding Throttles 

A client may try multiple strategies to avoid being throt- 
tled. A client may instrument its downloading applica- 
tion and the Tor software to send application data over 
multiple Tor circuits. However, these circuits will still be 
subject to throttling since each of them uses the same 
throttled TCP connection to the guard. A client may 
avoid this by attempting to create multiple TCP con- 
nections to the guard. In this case, the guard may eas- 
ily recognize that the connection requests come from 
the same client and can either deny the establishment 
of multiple connections or aggregate the accounting of 
all connections to that client. A client may use multi- 
ple guard nodes and send application data over each sep- 
arate guard connection, but the client significantly de- 
creases its anonymity by subverting the guard mecha- 
nism [58,59]. Finally, the client could run and use its 
own guard node and avoid throttling itself. Although this 
strategy may actually benefit the network since it reduces 
the amount of Tor’s capacity consumed by the client, the 
cost of running a guard may be sufficient to prevent its 
wide-scale adoption. 

It is important to note that the “cheating” techniques
outlined above do not decrease the security or perfor- 
mance below what unthrottled Tor provides. At worst, 
even if all clients somehow manage to elude the throttles, 
performance and security both regress to that of unthrot- 
tled Tor. In other words, throttling can only improve the 
situation whether or not “cheating” occurs in practice. 

6 Related Work 

6.1 Improving Tor’s Performance 

Recent work on improving Tor’s performance covers a 
wide range of topics, which we now enumerate. 
Incentives. A recognition that Tor is limited by its band- 
width resources has resulted in several proposals for de- 
veloping performance incentives for volunteering band- 
width as a Tor relay. New relays would provide ad- 
ditional resources and improve network performance. 
Ngan et al. explore giving better performance to re- 
lays that attain the fast and stable relay flags [43]. 
These relays are marked with a “gold star” in the di- 
rectory. Gold star relays may build circuits through 
other gold star relays, improving download performance. 
This scheme has a severe anonymity problem: any relay
on a gold star circuit can determine with absolute certainty
that the client is also a gold star relay. Jansen
et al. explore reducing anonymity problems from the 
gold star approach by distributing anonymous tickets to 
all clients [32]. Relays then collect tickets from clients in 
exchange for prioritized service and can prioritize their 
own traffic in return. However, a centralized bank lim- 
its the allowable number of tickets in circulation, leading 
to spending strategies that may reduce anonymity. Fi- 
nally, Moore et al. independently explored using static 
throttling configurations as a way to produce incentives 
for users to run relays in Tortoise [41]. Tortoise’s throt- 
tling configurations must be monitored as network load 
changes, and anonymity with Tortoise is slightly worse 
than with the gold star scheme: the intersection attack 
is improved since gold star nodes retain their gold stars 
for several months after dropping from the consensus, 
whereas Tortoise only unthrottles nodes that are in the 
current consensus. 

Relay Selection. Snader and Borisov [51] suggest an 
algorithm where relays opportunistically measure their 
peers’ performance, allowing clients to use empirical ag- 
gregations to select relays for their circuits. A user- 
tunable mechanism for selecting relays is built into the 
algorithm: clients may adjust how often the fast re- 
lays get chosen, trading off anonymity and performance 
while not significantly reducing either. It was shown 
that this approach increases accuracy of available band- 
width estimates and reduces reaction time to changes 
in network load while decreasing vulnerabilities to low- 
resource routing attacks. Wang et al. [57] propose a 
congestion-aware path selection algorithm where clients 
choose paths based on information gathered during op- 
portunistic and active measurements of relays. Clients 
use latency as an indication of congestion, and reject con- 
gested relays when building circuits. Improvements were 
realized for a single client, but it is unclear how the new
strategy would affect the network if used by all clients. 
Scheduling. Alternative scheduling approaches have re- 
cently gained interest. Tang and Goldberg [52] sug- 
gest each relay track the number of packets it sched- 
ules for each circuit. After a configurable time-period, 
packet counts are exponentially decayed so that data 
sent more recently has a greater influence on the packet 
count. For each scheduling decision, the relay flushes 
the circuit with the lowest cell count, favoring circuits 
that have not sent much data recently while preventing 
bursty traffic from significantly affecting scheduling pri- 
orities. Jansen et al. [32] investigate new schedulers 
based on the proportional differentiation model [21] and 
differentiable service classes. Relays track the delay of 
each service class and prioritize scheduling so that relative
delays are proportional to configurable differentiation
parameters, but the schedulers require a mechanism
(tickets) for differentiating traffic into classes. Finally,
Tor’s round-robin TCP read/write schedulers have
recently been noted as a source of unfairness for relays 
that have an unbalanced number of circuits per TCP con- 
nection [54]. Tschorsch and Scheuermann suggest that a 
round-robin scheduler could approximate a max-min al- 
gorithm [24] by choosing among all circuits rather than 
all TCP connections. More work is required to determine 
the suitability of this approach in Tor. 
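
The exponentially decayed cell counts used by the scheduler of Tang and Goldberg [52] can be illustrated with a small Python sketch. It follows only the description given above; the class name, the half-life parameterization, and the method names are assumptions for illustration and do not mirror Tor's source code.

```python
class EwmaScheduler:
    """Toy circuit scheduler that favors the circuit with the lowest
    exponentially decayed cell count."""

    def __init__(self, half_life_seconds):
        self.half_life = half_life_seconds
        self.counts = {}  # circuit id -> decayed cell count

    def advance(self, seconds):
        """Decay every count as if `seconds` of time had passed."""
        factor = 0.5 ** (seconds / self.half_life)
        for circuit_id in self.counts:
            self.counts[circuit_id] *= factor

    def record(self, circuit_id, cells=1):
        """Account for cells that were just scheduled on a circuit."""
        self.counts[circuit_id] = self.counts.get(circuit_id, 0.0) + cells

    def next_circuit(self):
        """Pick the circuit that has sent the least data recently."""
        return min(self.counts, key=self.counts.get)
```

Because old activity decays away, a circuit that was busy long ago is treated like a quiet one, while a circuit that is busy now is deprioritized.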

Congestion. Improving performance and reducing con- 
gestion has been studied by taking an in-depth look at 
Tor’s circuit and stream windows [7]. AlSabah et al. ex- 
periment with dynamically adjusting window sizes and 
find that smaller window sizes effectively reduce queuing
delays, but also decrease bandwidth utilization and there- 
fore hurt overall download performance. As a result, they 
implement and test an algorithm from ATM networks 
called the N23 scheme, a link-by-link flow control al- 
gorithm. Their adaptive N23 algorithm propagates infor- 
mation about the available queue space to the next up- 
stream router while dynamically adjusting the maximum 
circuit queue size based on outgoing cell buffer delays, 
leading to a quicker reaction to congestion. Their experi- 
ments indicate slightly improved response and download 
times for 300 KiB files.

Transport. Tor’s performance has also been analyzed 
at the socket level, resulting in suggestions for a UDP- 
based mechanism for data delivery [56] or using a user- 
level TCP stack over a DTLS tunnel [47]. While Tor cur- 
rently multiplexes all circuits over a single kernel TCP 
stream to control information leakage, the TCP-over- 
DTLS approach suggests separate user TCP streams for 
each circuit and sends all TCP streams between two re- 
lays over a single kernel DTLS-secured [40] UDP socket. 
As a result, a circuit’s TCP window is not unfairly re- 
duced when other high-bandwidth circuits cause queuing 
delays or dropped packets. 

6.2 Bandwidth Management 

Our approach to bandwidth management in this paper 
has been to use a token bucket rate-limiter, a classic traffic
shaping mechanism [55], to ensure that traffic conforms
to the desired policies. We now briefly discuss
other approaches to bandwidth management. 
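
For readers unfamiliar with the mechanism, a token bucket can be sketched in a few lines of Python. This is a generic textbook illustration rather than Tor's implementation; the class and method names are assumptions, and the rate and burst parameters play the roles of the throttle rate and burst discussed in Section 5.1.

```python
class TokenBucket:
    """Generic token bucket: tokens accrue at `rate` bytes per second up
    to a capacity of `burst` bytes, and sending consumes tokens."""

    def __init__(self, rate, burst):
        self.rate = rate     # refill rate, in bytes per second
        self.burst = burst   # bucket capacity, in bytes
        self.tokens = burst  # start with a full bucket

    def refill(self, elapsed_seconds):
        """Add tokens for the elapsed time, capped at the burst size."""
        self.tokens = min(self.burst, self.tokens + self.rate * elapsed_seconds)

    def consume(self, nbytes):
        """Return how many of `nbytes` may be sent right now."""
        allowed = min(nbytes, int(self.tokens))
        self.tokens -= allowed
        return allowed
```

A connection can therefore send up to the burst size immediately, after which its long-term throughput is bounded by the refill rate.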

Quality of Service. Networks often want to provide 
a certain quality of service (QoS) to their subscribers. 
There are two main approaches to QoS: Integrated Ser- 
vices (IntServ) and Differentiated Services (DiffServ). 

In the IntServ [11,50] model, applications request re- 
sources from the network using the resource reservation 
protocol [60]. Since the network must maintain the expected
quality for its current commitments, it must ensure
the load of the network remains below a certain
level. Therefore, new requests may be denied if the network
is unable to provide the resources requested. This
approach does not work well in an anonymity network
like Tor since clients would be able to request unbounded
resources without accountability and the network would
be unable to fulfill most requests due to bottlenecks.

In the DiffServ [9] model, applications notify the net- 
work of the desired service type by setting bits in the IP 
header. Routers then tailor performance toward an ex- 
pected notion of fairness (e.g. max-min fairness [24, 34] 
or proportional fairness [20,21,35]). Leaking this type of 
information about a client’s traffic flows is a significant 
risk to privacy and ways to provide differentiated service 
without such risk do not currently exist. 
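
For readers unfamiliar with max-min fairness, the sketch below computes a max-min fair allocation on a single shared link using the standard progressive-filling idea. It is illustrative only and is not drawn from the cited works; the function name and the single-link setting are assumptions.

```python
def max_min_fair(capacity, demands):
    """Return max-min fair allocations for `demands` sharing one link of `capacity`."""
    alloc = [0.0] * len(demands)
    remaining = float(capacity)
    # Visit flows from the smallest demand to the largest.
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    for pos, i in enumerate(order):
        # Split what is left equally among the flows not yet served.
        share = remaining / (len(order) - pos)
        # A flow never receives more than it asked for.
        alloc[i] = min(demands[i], share)
        remaining -= alloc[i]
    return alloc
```

With a capacity of 10 and demands of 2, 8, and 8, the small flow is fully satisfied and the two large flows split the remainder equally, receiving 4 each.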

Scheduling. Scheduling algorithms, such as fair queu- 
ing [15] and round robin [24, 25], affect the order in 
which packets are sent out of a given node, but gen- 
erally do not change the total number of packets being 
sent. Therefore, unless the sending rate is explicitly re- 
duced, the network will still contain similar load regard- 
less of the relative priority of individual packets. As ex- 
plained in Section 1 and Section 3, scheduling does not 
directly reduce network congestion, but may cooperate 
with other bandwidth management techniques to achieve 
the desired performance characteristics of traffic classes. 
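
The point that scheduling reorders packets without reducing load can be seen in a small sketch. The round_robin helper and the packet labels below are hypothetical and only illustrate the idea; they do not model Tor's circuit scheduler.

```python
from collections import deque

def round_robin(queues):
    """Interleave packets from per-circuit queues, taking one packet per turn."""
    pending = deque(deque(q) for q in queues if q)
    out = []
    while pending:
        q = pending.popleft()
        out.append(q.popleft())   # send one packet from this circuit
        if q:
            pending.append(q)     # circuit still has data: back of the line
    return out

bulk = ["b1", "b2", "b3", "b4"]   # a bulk circuit with a backlog
web = ["w1"]                      # an interactive circuit with one packet
fifo = bulk + web                 # first-come, first-served order
rr = round_robin([bulk, web])     # round-robin order
```

Here the web packet moves from last place under first-come, first-served to second place under round robin, yet both orderings send the same five packets, so the total load placed on the network is unchanged.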

7 Conclusion 

This paper analyzes client throttling by guard relays to 
reduce Tor network bottlenecks and improve responsive- 
ness. We explore static throttling configurations while 
designing, implementing, and evaluating three new throt- 
tling algorithms that adaptively select which connections 
get throttled and dynamically adjust the throttle rate of 
each connection. Our adaptive throttling techniques use 
only local relay information and are considerably more 
effective than static throttling since they do not require 
re-evaluation of throttling parameters as network load 
changes. We find that client throttling is effective at
both improving performance for interactive clients and 
increasing Tor’s network resilience. We also analyzed 
the effects throttling has on anonymity and discussed the 
security of our algorithms against realistic adversarial at-
tacks. We find that throttling improves anonymity: a
guard’s bandwidth leaks more information about its cir- 
cuits when throttling is disabled. 

Future Work. There are many directions for future re- 
search. Our current algorithms may be modified to op- 
timize performance by improving classification of bulk 
traffic, considering alternative strategies for distinguish- 
ing web from bulk connections. Additional approaches 
to rate-tuning are also of interest, e.g. it may be possi- 
ble to further improve web client performance using pro- 
portional fairness to schedule traffic on circuits. Also of 
interest is an analysis of throttling in the context of con-
gestion and flow control to determine the interrelation
and effects the algorithms have on each other. Finally, a 
deeper understanding of our algorithms and their effects 
on client performance would be possible through analy- 
sis on the live Tor network. 